We introduce and study a new optimization problem called Hyper Vertex Cover (HVC). This problem is
a generalization of the standard vertex cover to hypergraphs: one seeks a configuration of particles
with minimal density such that every hyperedge of the hypergraph contains at least one particle.
It can also be used in important practical tasks, such as Group Testing procedures, where one
wants to detect defective items in a large group by pool testing. Using a statistical mechanics
approach based on the cavity method, we study the phase diagram of the HVC problem in the case
of random regular hypergraphs. Depending on the values of the variable and test degrees, different
situations can occur: the HVC problem can be either in a replica symmetric phase or in a one-step
replica symmetry breaking phase. In these two cases, we give explicit results on the minimal density
of particles and on the structure of the phase space. These problems are thus in some sense simpler
than the original vertex cover problem, where the need for full replica symmetry breaking has so far
prevented the derivation of exact results. Finally, we show that decimation procedures based
on the belief propagation and survey propagation algorithms provide very efficient strategies to
solve large individual instances of the hyper vertex cover problem.



I. INTRODUCTION AND MOTIVATION 

The vertex cover problem is one of the standard NP-complete problems [1]. It is also intimately related to spin glass
problems in statistical physics, and has received a lot of attention in recent years from the physics community [2, 3, 4, 5].
In this paper we study a generalization of this problem to hyper-graphs, which we call the hyper vertex cover (HVC)
problem. To the best of our knowledge this problem has not been studied before, at least not from a statistical physics
perspective. The problem is easily stated: we consider a large population of N variables $x_i$, which can be either
active ($x_i = 1$) or inactive ($x_i = 0$), and M function nodes (or tests) $a$, and build up a random regular hyper-graph
(or factor graph), i.e., a bipartite graph where each variable is connected to exactly L tests, and each test is connected
to exactly K variables (with $NL = MK$). The function nodes enforce the constraint that at least one of the variables
to which they are connected must be active. The case K = 2 reduces to the standard vertex cover (VC) problem. In
this paper we will be mainly interested in the minimal HVC, i.e., a cover of the hyper-graph with the minimal possible
number of active variables.

The model can also be interpreted as a system of M Boolean clauses over the N variables which should be simultaneously
satisfied: each clause is an OR function involving K randomly chosen variables (e.g., $x_{i_1} \vee \dots \vee x_{i_K} = 1$), and
each variable appears in exactly L clauses. Thus, the minimal HVC configuration corresponds to an assignment of the
variables $x_i$ which satisfies all the M clauses with the minimal number of ones.

The interest of the present study is twofold. Statistical physics studies of the VC problem on random graphs have
shown that, depending on the average degree of a variable in the graph, the problem is either simple (meaning that
replica symmetry is unbroken) or very difficult (meaning that a full replica symmetry breaking (RSB) scheme is
necessary). As the theory of full RSB is not well under control for 'finite connectivity' graphs (where the degrees
of variables are finite), the difficult region is not fully understood, notwithstanding the recent progress made in
Refs. [...]. As we shall see, the HVC displays, for certain families of random graphs, an intermediate situation
which is non-trivial, because RSB is needed, but under control, because the solution is given by one-step RSB
(1RSB), as developed for finite connectivity problems in [3]. Therefore this model joins the family of well controlled
hard combinatorial optimization problems in which 1RSB is supposed to give exact results, a family which includes
K-satisfiability [...], graph coloring, random Boolean equations [12], 1-in-K satisfiability [...], and lattice
glasses.

On the other hand, there are several applications of HVC to important practical "real-world" tasks. In particular,
HVC is closely related to Group Testing (or Pool Testing) procedures. The object of Group Testing is
to identify an a priori unknown subset of a large population of N variables, called the set of active (or defective) items,
using as few queries as possible. Each query (or test) is connected to a certain subset of K items, and informs the
tester about whether or not the subset contains at least one active item. A negative answer implies that all the items
of the subset are inactive. This approach is used in many different applications, beginning with an efficient blood
testing procedure [...]. Other applications include quality control in product testing [...], searching files in storage
systems [...], efficient accessing of computer memories, sequential screening of experimental variables [...], and many
others, such as the basic problem of DNA library screening, which is very important in modern biological applications
such as monoclonal antibody generation. Objectives of group testing range from finding an optimal strategy with the
minimal number of tests to devising an efficient algorithm able to reconstruct the pattern of active items.

In the Group Testing problem, it is easy to first identify the variables which are sure zeros (those connected
to at least one negative test). If one now considers the reduced graph obtained from the original Group Testing (GT)
problem by removing these sure-zero variables, as well as all the negative tests, one obtains a reduced graph (with
fluctuating degrees). The problem of identifying the active items in this graph is exactly the HVC problem. Therefore,
the study of the phase diagram of the HVC could give useful insights, for example, to understand which reconstruction
algorithm for the pattern of active items is best suited to the topological properties of the factor graph.
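The reduction just described can be sketched in a few lines. This is a minimal illustration assuming noise-free test outcomes; the function name `reduce_group_testing` and the encoding of each test as a list of item indices are hypothetical choices made for this example.

```python
def reduce_group_testing(tests, outcomes):
    """Map a noise-free group-testing instance onto an HVC instance.

    tests    -- list of pools, each a list of item indices
    outcomes -- list of booleans, True if the corresponding pool tested positive
    Returns (sure_zeros, reduced_tests).
    """
    sure_zeros = set()
    for pool, positive in zip(tests, outcomes):
        if not positive:
            # A negative pool certifies that all of its items are inactive.
            sure_zeros.update(pool)
    # Each positive pool, stripped of the sure zeros, must still contain at
    # least one active item: this is exactly an HVC constraint.
    reduced_tests = [[i for i in pool if i not in sure_zeros]
                     for pool, positive in zip(tests, outcomes) if positive]
    return sure_zeros, reduced_tests


# Usage: 5 items, item 1 is the only defective one.
tests = [[0, 1, 2], [2, 3], [3, 4], [1, 4]]
defective = {1}
outcomes = [any(i in defective for i in pool) for pool in tests]
sure_zeros, reduced = reduce_group_testing(tests, outcomes)
print(sorted(sure_zeros), reduced)  # [2, 3, 4] [[0, 1], [1]]
```

A reduced test containing a single item (here `[1]`) forces that item to be active, and the degrees of the reduced graph fluctuate, as noted above.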

In this paper we first present the study of the phase diagram of the random HVC problem, where each variable
appears in L tests and each test involves K variables, based on the cavity method [...] (these results could also be
obtained in the framework of the replica approach [..., 23]). We show that, depending on the values of the variable and
test degrees, different situations can occur: the minimal HVC problem can be either in a replica symmetric (RS) phase
or in a one-step replica symmetry breaking (1RSB) phase. In these two cases we give explicit results on the minimal
density of active items and on the structure of the phase space. On the other hand, there are also cases (like, e.g., the
ordinary VC with K = 2 and L > 3) where a higher order RSB pattern is needed; these are more difficult problems
that we do not address in this paper. The summary of the situation for the various values of K and L is contained
in Figure 4. We then introduce the survey propagation (SP) and belief propagation (BP) type algorithms, the
analogs for this problem of the ones which have turned out to be so efficient in K-satisfiability [...]. We show that
a decimation procedure based on the surveys is an efficient way to solve large instances of the HVC
problem.

The paper is organized as follows: in the next section we define the problem and introduce the statistical
mechanics formulation; in Sec. III we develop the cavity approach and work out the RS solution; in Sec. IV we focus
on the 1RSB solution for the phase diagram, on the minimal HVC limit, and on the stability of the 1RSB approach
with respect to further breaking of the replica symmetry; Sec. V explains the use of the BP and SP algorithms, together
with a decimation procedure, to solve individual large instances of the problem; finally, in Sec. VI
we discuss the results and conclude this work.

II. HYPER VERTEX COVER: DEFINITION AND STATISTICAL PHYSICS FORMULATION 

We consider a factor graph containing N + M vertices: N of them are associated with variables, which we label
by $i, j, \ldots \in \{1, \ldots, N\}$. The other M are associated with function nodes (or tests), denoted by $a, b, \ldots \in \{1, \ldots, M\}$.
An edge $(i, a)$ between variable $i$ and function node $a$ is present in the factor graph whenever variable $i$ appears in test $a$.
The factor graph is bipartite.

The HVC problem is the following: each variable $i$ can be either inactive ($x_i = 0$) or active ($x_i = 1$). We require
that, for each of the M tests, at least one of the K variables connected to it be equal to one. The optimization
problem (minimal HVC) consists in finding a configuration that satisfies all these constraints and has the smallest
value $A_{\min}$ of the "weight"

$$ A(\{x_i\}) = \sum_{i=1}^{N} x_i . $$
One is also interested in knowing the number of configurations which satisfy all constraints when A has a fixed value.
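For very small instances, $A_{\min}$ and the number of covers at each fixed value of A can be obtained by exhaustive enumeration, which is a useful sanity check of the definitions. The sketch below assumes tests are given as lists of variable indices; the helper names are hypothetical, and the cost grows as $2^N$, so it is only meant for toy sizes.

```python
from collections import Counter
from itertools import product


def is_cover(tests, x):
    """True if every test contains at least one active variable (x[i] == 1)."""
    return all(any(x[i] for i in test) for test in tests)


def covers_by_weight(tests, n):
    """Exhaustive count of covers, keyed by the weight A = sum_i x_i."""
    counts = Counter()
    for x in product((0, 1), repeat=n):
        if is_cover(tests, x):
            counts[sum(x)] += 1
    return counts


# A regular (L, K) = (2, 3) instance: N = 6 variables, M = 4 tests, NL = MK = 12.
tests = [[0, 1, 2], [3, 4, 5], [0, 1, 3], [2, 4, 5]]
counts = covers_by_weight(tests, 6)
a_min = min(counts)
print(a_min, counts[a_min])  # 2 5
```

Here $A_{\min} = 2$ saturates the trivial bound $A \ge M/L = N/K$, which holds because each active variable belongs to exactly L tests.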



We shall use the following statistical mechanics formulation of the problem. Given an instance of the problem,
characterized by a factor graph, we introduce the set of admissible configurations, $\mathcal{C}$, which are all configurations of
the N variables such that, for each clause $a$, at least one variable connected to $a$ takes value 1. Denoting by $\partial a$ the
variables which enter clause $a$ (i.e., the set of neighbors of $a$ in the factor graph), we thus impose $\prod_{i \in \partial a} (1 - x_i) = 0$.
The Boltzmann-Gibbs measure (canonical ensemble) is defined on $\mathcal{C}$ as $P(\{x_i\}) = (1/Z)\, e^{-\mu A(\{x_i\})}$. The chemical potential
$\mu$ controls the overall density of ones, $\rho = \sum_i x_i / N$. The minimal HVC problem is recovered in the $\mu \to \infty$ limit,
where the Boltzmann-Gibbs measure concentrates on configurations with the smallest number of active variables. We
are also interested in understanding the properties of the microcanonical measure $P^{\mathrm{mc}}$, which is the uniform measure
on the admissible configurations $\{x_i\}$ with a fixed weight $A(\{x_i\}) = A$, whenever such configurations exist. These
properties will be studied hereafter through a detour via the canonical ensemble, using ensemble equivalence.

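The partition function of the model reads

$$ Z = \sum_{\{x_i\} \in \mathcal{C}} e^{-\mu A(\{x_i\})} = e^{-\mu F}, \qquad S = \mu \left( \langle A \rangle - F \right), $$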
FIG. 1: Sub-graph rooted in i in the absence of the neighboring test a (on the left). The function nodes b belong to the neighborhood
of i ($b \in \partial i \setminus a$) and send the messages $v_{b \to i}$ to the site i. The site i sends the message $h_{i \to a}$ to the function node a, according
to Eqs. (5). Analogously, on the right we have sketched the sub-graph rooted in the function node a in the absence of the edge with
the variable i. The sites j belong to the neighborhood of a ($j \in \partial a \setminus i$) and send the messages $h_{j \to a}$ to the function node a.
According to Eqs. (5), a sends the message $v_{a \to i}$ to the site i.



where F is the free energy and S is the entropy. 

In the following we shall be particularly interested in the random (L, K) HVC problem, in which the factor graph is a
random regular (L, K) bipartite graph, uniformly chosen from all the possible graphs where each variable is connected
to L tests and each test is connected to K variables.
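A standard way to sample such graphs is the configuration model: each variable gets L "sockets", each test gets K sockets, and the sockets are matched uniformly at random. The sketch below is an assumption about how one might generate instances, not necessarily the procedure used for the results reported here, and the function name is hypothetical. It keeps multi-edges (a variable appearing twice in the same test), whose expected number stays of order one as N grows; conditioning on their absence (e.g., by rejection) gives the uniform distribution over simple (L, K) graphs, but rejection becomes slow when (L - 1)(K - 1) is large.

```python
import random
from collections import Counter


def random_regular_factor_graph(n, L, K, seed=None):
    """Configuration-model sketch of a random (L, K) regular factor graph.

    Returns a list of M = n*L/K tests, each a list of K variable indices.
    Multi-edges are NOT removed.
    """
    if (n * L) % K != 0:
        raise ValueError("N*L must be divisible by K so that N*L = M*K")
    m = n * L // K
    rng = random.Random(seed)
    sockets = [i for i in range(n) for _ in range(L)]  # L copies of each variable
    rng.shuffle(sockets)                               # uniform random matching
    # Consecutive blocks of K sockets form the tests.
    return [sockets[a * K:(a + 1) * K] for a in range(m)]


tests = random_regular_factor_graph(120, 6, 12, seed=0)
degrees = Counter(i for test in tests for i in test)
print(len(tests), set(map(len, tests)), set(degrees.values()))  # 60 {12} {6}
```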

The thermodynamic limit is taken by letting the number of variables N and the number of constraints M go to
infinity with a fixed ratio $\alpha = M/N = L/K$, keeping the degrees K and L fixed. In the following we use the cavity method,
which allows one to write down iterative self-consistent relations for local expectation values that are exact on a loop-less
factor graph (i.e., a tree). For any finite values of N and M the graph is only locally tree-like, and it has
loops whose average length is expected to scale as $\ln N$. Therefore the cavity method is expected to provide good
approximations for large samples and to become exact in the thermodynamic limit.



III. CAVITY APPROACH AND REPLICA SYMMETRIC SOLUTION 



Given a graph and a variable i, we consider a sub-graph rooted in i obtained by removing the edge between i and
one of its neighboring tests, a. Define $Z_0^{(i \to a)}$ and $Z_1^{(i \to a)}$ as the partition functions of this sub-graph restricted to
configurations where the variable $x_i$ is, respectively, inactive ($x_i = 0$) or active ($x_i = 1$). Assuming that the sub-graph is
a tree (which is generically correct up to any finite depth when one takes the large N limit), these restricted partition
functions can be written recursively (see Fig. 1): consider the function nodes b belonging to the neighborhood $\partial i$ of
i. For each $b \in \partial i \setminus a$, consider the partition functions $Y^{(b \to i)}$ and $\tilde{Y}^{(b \to i)}$ on the rooted branches of the
graph starting from b, which are defined, respectively, as the total partition function of the branch, and the partition
function of the branch restricted to configurations in which at least one of the remaining K - 1 variables connected
to the test is active.

One obtains the following recursion relations:

$$ Z_1^{(i \to a)} = e^{-\mu} \prod_{b \in \partial i \setminus a} Y^{(b \to i)}, \qquad Z_0^{(i \to a)} = \prod_{b \in \partial i \setminus a} \tilde{Y}^{(b \to i)} . \tag{3} $$

Analogously, one can express $\tilde{Y}^{(a \to i)}$ and $Y^{(a \to i)}$ in terms of the restricted partition functions $Z_0^{(j \to a)}$ and $Z_1^{(j \to a)}$ for
$j \in \partial a \setminus i$:

$$ \tilde{Y}^{(a \to i)} = \prod_{j \in \partial a \setminus i} \left( Z_0^{(j \to a)} + Z_1^{(j \to a)} \right) - \prod_{j \in \partial a \setminus i} Z_0^{(j \to a)}, \qquad Y^{(a \to i)} = \prod_{j \in \partial a \setminus i} \left( Z_0^{(j \to a)} + Z_1^{(j \to a)} \right) . \tag{4} $$

It is now convenient to introduce two local cavity fields on each edge of the graph, defined as:

$$ e^{\mu h_{i \to a}} = \frac{Z_1^{(i \to a)}}{Z_0^{(i \to a)} + Z_1^{(i \to a)}}, \qquad e^{\mu v_{a \to i}} = \frac{\tilde{Y}^{(a \to i)}}{Y^{(a \to i)}} . $$

Basically, $e^{\mu h_{i \to a}}$ measures the local probability that the variable i is active in the absence of the link with the test a.
In terms of the local cavity fields, the recursion relations, Eqs. (3) and (4), read:

$$ e^{\mu h_{i \to a}} = \frac{e^{-\mu}}{e^{-\mu} + \exp\left( \mu \sum_{b \in \partial i \setminus a} v_{b \to i} \right)}, \qquad e^{\mu v_{a \to i}} = 1 - \prod_{j \in \partial a \setminus i} \left( 1 - e^{\mu h_{j \to a}} \right) . \tag{5} $$



For any given finite graph, Eqs. (5) provide a set of 2MK coupled algebraic equations (two for each edge of the graph),
the so-called belief propagation (BP) equations, which can in principle be solved on any individual instance.
From these local fields one can compute the free energy density $f = F/N$ of the system in terms of variable, test, and edge
contributions [...]. In order to do that, let us consider an intermediate object: a factor graph made up of N variables
and M tests where KL 'defective' variables are connected to only L - 1 tests and KL 'defective' tests are connected
to only K - 1 variables, while all other variables and tests have their natural degree (respectively L and K). We can
now go from this intermediate graph to a regular factor graph, where each variable is connected to L tests
and each test to K variables, in two ways:

i) We can either add K new items and L new tests, connecting each new item to L of the KL defective
tests and each new test to K of the KL defective variables. In this way we obtain a regular hyper-graph
made up of N + K variables and M + L tests, all with their natural degree.

ii) Or we can add KL new edges between pairs of defective variables and tests. In this way we obtain a regular
hyper-graph made up of N variables and M function nodes, all with their natural degree.

In formulae we get:

$$ F(N + K) - F(N) = \left[ F_0 + \sum_{i=1}^{K} \Delta F_i + \sum_{a=1}^{L} \Delta F_a \right] - \left[ F_0 + \sum_{(j, b)} \Delta F_{jb} \right], \tag{6} $$

where $F_0$ is the free energy of the intermediate graph, $\Delta F_i$ is the free energy shift due to the addition of a new
variable i, $\Delta F_a$ is the free energy shift due to the addition of a new test a, and $\Delta F_{jb}$ is the free energy shift due to
the addition of a new edge between the item j and the test b (the last sum runs over the KL added edges). Supposing that
the free energy scales linearly with the number of items, $F(N + K) - F(N) \simeq K f$, the previous relation allows us to
determine the free energy density.

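The free energy shifts appearing in Eq. (6) can be written in terms of the restricted partition functions defined
above:

$$ \begin{aligned} e^{-\mu \Delta F_i} &= \frac{e^{-\mu} \prod_{b \in \partial i} Y^{(b \to i)} + \prod_{b \in \partial i} \tilde{Y}^{(b \to i)}}{\prod_{b \in \partial i} Y^{(b \to i)}}, \\ e^{-\mu \Delta F_a} &= \frac{\prod_{j \in \partial a} \left( Z_0^{(j \to a)} + Z_1^{(j \to a)} \right) - \prod_{j \in \partial a} Z_0^{(j \to a)}}{\prod_{j \in \partial a} \left( Z_0^{(j \to a)} + Z_1^{(j \to a)} \right)}, \\ e^{-\mu \Delta F_{jb}} &= \frac{Z_1^{(j \to b)} Y^{(b \to j)} + Z_0^{(j \to b)} \tilde{Y}^{(b \to j)}}{\left( Z_0^{(j \to b)} + Z_1^{(j \to b)} \right) Y^{(b \to j)}} . \end{aligned} \tag{7} $$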
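The free energy shifts can finally be rewritten in terms of the local cavity fields in the following way:

$$ \begin{aligned} -\mu \Delta F_i &= \ln \left[ e^{-\mu} + \exp\left( \mu \sum_{b \in \partial i} v_{b \to i} \right) \right], \\ -\mu \Delta F_a &= \ln \left[ 1 - \prod_{j \in \partial a} \left( 1 - e^{\mu h_{j \to a}} \right) \right], \\ -\mu \Delta F_{jb} &= \ln \left[ e^{\mu h_{j \to b}} + \left( 1 - e^{\mu h_{j \to b}} \right) e^{\mu v_{b \to j}} \right] . \end{aligned} \tag{8} $$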
The probability that site i is active (the marginals of the BP equations) can be obtained by differentiating the free
energy with respect to the chemical potential, which leads to the density of active variables:

$$ \rho = \frac{1}{N} \sum_{i=1}^{N} \frac{e^{-\mu}}{e^{-\mu} + \exp\left( \mu \sum_{b \in \partial i} v_{b \to i} \right)} . \tag{9} $$
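On a tree, the BP equations (5) together with the shifts (8) give exact marginals and the exact free energy. The sketch below checks this on a small hand-made tree against exhaustive enumeration. It is a minimal illustration under our own assumptions: messages are stored as probabilities $p_{i\to a} = e^{\mu h_{i\to a}}$ and $q_{a\to i} = e^{\mu v_{a\to i}}$ rather than as fields, the free energy of the given instance is assembled as $-\mu F = \sum_i (-\mu \Delta F_i) + \sum_a (-\mu \Delta F_a) - \sum_{(j,b)} (-\mu \Delta F_{jb})$, and all function names are hypothetical.

```python
import math
from itertools import product


def bp_hvc(tests, n, mu, sweeps=50):
    """Iterate Eqs. (5) in probability form; return (marginals, ln Z)."""
    edges = [(a, i) for a, test in enumerate(tests) for i in test]
    at_var = [[e for e, (_, i) in enumerate(edges) if i == k] for k in range(n)]
    at_test = [[e for e, (a, _) in enumerate(edges) if a == b]
               for b in range(len(tests))]
    w = math.exp(-mu)
    p = [0.5] * len(edges)  # p[e] = exp(mu * h_{i->a}) on edge e = (a, i)
    q = [0.5] * len(edges)  # q[e] = exp(mu * v_{a->i}) on edge e = (a, i)
    for _ in range(sweeps):
        p = [w / (w + math.prod(q[f] for f in at_var[i] if f != e))
             for e, (_, i) in enumerate(edges)]
        q = [1.0 - math.prod(1.0 - p[f] for f in at_test[a] if f != e)
             for e, (a, _) in enumerate(edges)]
    q_site = [math.prod(q[f] for f in at_var[i]) for i in range(n)]
    marginals = [w / (w + q_site[i]) for i in range(n)]  # Eq. (9), site by site
    log_z = (sum(math.log(w + q_site[i]) for i in range(n))          # -mu dF_i
             + sum(math.log(1.0 - math.prod(1.0 - p[f] for f in at_test[a]))
                   for a in range(len(tests)))                        # -mu dF_a
             - sum(math.log(p[e] + (1.0 - p[e]) * q[e])
                   for e in range(len(edges))))                       # -mu dF_jb
    return marginals, log_z


def exact_hvc(tests, n, mu):
    """Exhaustive marginals and ln Z over the 2**n configurations."""
    z, acc = 0.0, [0.0] * n
    for x in product((0, 1), repeat=n):
        if all(any(x[i] for i in test) for test in tests):
            weight = math.exp(-mu * sum(x))
            z += weight
            acc = [s + weight * xi for s, xi in zip(acc, x)]
    return [s / z for s in acc], math.log(z)


tree = [[0, 1, 2], [2, 3], [3, 4, 5], [1, 6]]  # 7 variables, 4 tests, 10 edges
bp_marg, bp_logz = bp_hvc(tree, 7, mu=1.3)
ex_marg, ex_logz = exact_hvc(tree, 7, mu=1.3)
print(max(abs(s - t) for s, t in zip(bp_marg, ex_marg)), abs(bp_logz - ex_logz))
```

On graphs with loops the same code only gives the Bethe approximation, which is the regime where, as discussed below, the RS assumption has to be checked.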



A. Replica Symmetric solution, entropy crisis and stability 

Eqs. (5) can be written for arbitrary graphs, but they provide exact marginal probability distributions (and thus
exact densities of active variables) only on loop-less trees. They are particularly suited for very large random hyper-graphs,
where, due to the local tree-like structure, they are expected to provide exact results in the RS phase.

The simplest hypothesis one can make is the so-called Replica Symmetric (RS) Ansatz. Assuming that there is a
single state describing the equilibrium behavior of the system, one can look for factorized Replica Symmetric solutions
of the cavity equations, where all the local fields are equal on all the edges of the graph, i.e., $h_{i \to a} = h_{RS}$ and
$v_{a \to i} = v_{RS}$, $\forall (i, a)$. Within this approximation, Eqs. (5) reduce to:

$$ e^{\mu h_{RS}} = \frac{e^{-\mu}}{e^{-\mu} + e^{(L - 1) \mu v_{RS}}}, \qquad \mu v_{RS} = \ln \left[ 1 - \left( 1 - e^{\mu h_{RS}} \right)^{K - 1} \right] . \tag{10} $$



The free energy shifts reduce to node and edge independent quantities and can be easily evaluated, along with all the 
thermodynamic observables. 

In Figure 2 the density of active variables $\rho$ and the entropy density $s = S/N = -\mu F/N + \mu \rho$ are plotted as
functions of the chemical potential $\mu$ for L = 2 and K = 6 (left panel) and for L = 6 and K = 12 (right panel).
In this RS solution, $\rho$ goes to 1/K as $\mu$ goes to infinity. One readily notes that in the first case the entropy
density stays finite even in the $\mu \to \infty$ limit (meaning that there is an extensive number of RS states with density
of active variables equal to 1/K satisfying the minimal covering of the hyper-graph), whereas in the second case the entropy
density becomes negative for chemical potentials larger than a certain threshold, $\mu_{s=0}$ (or, equivalently, for densities
of active variables smaller than a certain value).

These results can be easily understood by noting that in the $\mu \to \infty$ limit the RS fields are simply given by:

$$ \mu h_{RS} = -\frac{\mu}{L} - \frac{L - 1}{L} \ln(K - 1), \qquad \mu v_{RS} = -\frac{\mu}{L} + \frac{1}{L} \ln(K - 1) . \tag{11} $$
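Therefore, according to Eq. (9), $\rho = 1/K$, whereas the free energy density reads:

$$ f(\mu \to \infty) = \frac{1}{K} + \frac{(L - 1)(K - 1) - 1}{\mu K} \ln K - \frac{(L - 1)(K - 1)}{\mu K} \ln(K - 1), \tag{12} $$

which immediately leads to:

$$ s(\mu \to \infty) = \frac{1}{K} \left\{ (L - 1)(K - 1) \ln(K - 1) - \left[ (L - 1)(K - 1) - 1 \right] \ln K \right\} . \tag{13} $$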

In the large connectivity limit ($K, L \gg 1$), the entropy density scales as:

$$ s(\mu \to \infty; L, K \gg 1) \simeq \frac{\ln K - L}{K} . \tag{14} $$

Thus, $s(\mu \to \infty)$ is positive if $\ln K > L$, while it is negative for $L > \ln K$.
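The RS curves discussed above can be reproduced numerically. The sketch below solves the factorized equations (10) at finite $\mu$ by bisection on $b = e^{\mu v_{RS}}$ (after eliminating $a = e^{\mu h_{RS}}$, the mismatch $b - [1 - (1 - a(b))^{K-1}]$ is increasing in b, so the root is unique), then evaluates $\rho$ from Eq. (9) and $s = -\mu f + \mu \rho$ from the shifts (8) combined as in Eq. (6), i.e. $f = \Delta F_i + (L/K)\Delta F_a - L\,\Delta F_{jb}$ on a regular graph. The solver choice, tolerances, and names are our own assumptions; at large $\mu$ the output should approach $\rho = 1/K$ and Eq. (13).

```python
import math


def rs_density_entropy(L, K, mu, iters=200):
    """Finite-mu RS solution of Eqs. (10); returns (rho, s) with s = -mu*f + mu*rho."""
    w = math.exp(-mu)

    def a_of_b(b):
        # a = exp(mu * h_RS) from the first of Eqs. (10), with b = exp(mu * v_RS)
        return w / (w + b ** (L - 1))

    def b_of_a(a):
        # second of Eqs. (10): b = 1 - (1 - a)^(K-1), via log1p/expm1 for tiny a
        return -math.expm1((K - 1) * math.log1p(-a))

    lo, hi = 0.0, 1.0  # bisection on b; the bracket never reaches b = 0 exactly
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid < b_of_a(a_of_b(mid)):
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = a_of_b(b)
    rho = w / (w + b ** L)  # Eq. (9) on the regular graph
    minus_mu_f = (math.log(w + b ** L)                                    # -mu dF_i
                  + (L / K) * math.log(-math.expm1(K * math.log1p(-a)))  # -mu dF_a
                  - L * math.log(a + (1.0 - a) * b))                     # -mu dF_jb
    return rho, minus_mu_f + mu * rho


def s_rs_limit(L, K):
    """Eq. (13): RS entropy density in the mu -> infinity limit."""
    c = (L - 1) * (K - 1)
    return (c * math.log(K - 1) - (c - 1) * math.log(K)) / K


for L, K in [(2, 6), (6, 12)]:
    for mu in (1.0, 5.0, 100.0):
        rho, s = rs_density_entropy(L, K, mu)
        print(L, K, mu, round(rho, 5), round(s, 5), round(s_rs_limit(L, K), 5))
```

For (L, K) = (6, 12) the large-$\mu$ entropy comes out negative, which is the entropy crisis visible in the right panel of Fig. 2.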

Clearly, the results presented above imply that the RS solution is wrong for large enough $\mu$, at least in the case
L = 6 and K = 12. The RS solution becomes incorrect when the assumption of a single equilibrium
state fails, meaning that the fields incoming to a given node become correlated. To gain further insight, one can test the
stability of the RS solution by computing the non-linear susceptibility, defined as:

$$ \chi_2 = \frac{1}{N} \sum_{i, j} \langle x_i x_j \rangle_c^2 . \tag{15} $$
FIG. 2: Density of active items, $\rho$, and entropy density, $s = S/N$, as a function of the chemical potential, $\mu$, in the RS solution
of the HVC for L = 2 and K = 6 (left panel) and for L = 6 and K = 12 (right panel). The red dotted vertical line in the right
panel corresponds to the point where the RS solution becomes unstable.



If $\chi_2$ diverges, the incoming cavity fields are strongly correlated and the RS assumption is inconsistent. (Notice that
the divergence of the linear susceptibility corresponds to a modulation instability, which is not compatible with the
random nature of the graph.) By using the fluctuation-dissipation theorem, one can relate the connected correlation
functions between nodes i and j to the local cavity fields. This finally leads to the following stability criterion:

$$ \sum_{j \in \partial a \setminus i;\, b \in \partial j \setminus a} \left( \frac{\partial v_{a \to i}}{\partial h_{j \to a}} \frac{\partial h_{j \to a}}{\partial v_{b \to j}} \right)^2 < 1 . \tag{16} $$



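For the RS solution this criterion yields:

$$ \sqrt{(L - 1)(K - 1)} \left| \frac{\partial v_{a \to i}}{\partial h_{j \to a}} \frac{\partial h_{j \to a}}{\partial v_{b \to j}} \right|_{RS} < 1 . \tag{17} $$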



Using the previous equation, we find that for several values of L and K the RS solution indeed becomes unstable
above a certain chemical potential. For L = 6 and K = 12 the point where the instability appears is marked in the
right panel of Fig. 2 as a vertical dotted line. However, it is well known from the physics of glassy systems that the RS solution
can also fail because of a first order transition to a replica symmetry breaking solution, which is not detected by the stability
argument.



IV. 1RSB CAVITY APPROACH



Figure 2 clearly shows that the RS solution fails at high chemical potential and low density of active items (at least
for L = 6 and K = 12), because the hypothesis of the existence of a single state becomes inconsistent.
Therefore, in this case other solutions must be found. In the following we employ a one-step Replica Symmetry
Breaking (1RSB) approach [6] within the cavity method described in the previous section. More precisely, we assume
that a number of (pure) states exponentially large in the size of the system exists, and that the neighbors of a given node, in the
absence of the node itself, are uncorrelated only within each of these states.

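Local fields on a given edge can now fluctuate from pure state to pure state. The cavity method provides a
statistical description of the local fields in each state $\alpha$, which must be weighted according to their Boltzmann
weight $e^{-\mu m F_\alpha}$ [...]. In order to deal with this situation, on each edge of the graph one has to introduce two probability
distribution functions, $P_{i \to a}(h)$ and $Q_{a \to i}(v)$: $P_{i \to a}(h)$ gives the probability density of finding the field $h_{i \to a}$ equal
to h in a randomly chosen state (respectively, $Q_{a \to i}(v)$ gives the probability density of finding the field $v_{a \to i}$ equal to v). By
using the cavity method, the following self-consistent integral relations are found [...]:

$$ \begin{aligned} P_{i \to a}(h) &= \frac{1}{\mathcal{Z}_1} \int \prod_{b \in \partial i \setminus a} dQ_{b \to i}(v_{b \to i})\, \delta\!\left( h - h(\{v_{b \to i}\}) \right) e^{-\mu m \Delta F^{i \cup \partial i \setminus a}}, \\ Q_{a \to i}(v) &= \frac{1}{\mathcal{Z}_2} \int \prod_{j \in \partial a \setminus i} dP_{j \to a}(h_{j \to a})\, \delta\!\left( v - v(\{h_{j \to a}\}) \right) e^{-\mu m \Delta F^{a \cup \partial a \setminus i}}, \end{aligned} \tag{18} $$

where $h(\{v_{b \to i}\})$ and $v(\{h_{j \to a}\})$ enforce the local cavity equations, Eqs. (5), m is the usual 1RSB parameter [...]
(fixed by the maximization of the free energy with respect to it), $\mathcal{Z}_1$ and $\mathcal{Z}_2$ are normalization factors, and $\Delta F^{i \cup \partial i \setminus a}$
and $\Delta F^{a \cup \partial a \setminus i}$ are the free energy shifts involved in the iteration processes, which take into account the reweighting
factors of the different pure states. Using the relations: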



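$$ \begin{aligned} -\mu \Delta F^{i \cup \partial i \setminus a} &= \ln \frac{Z_0^{(i \to a)} + Z_1^{(i \to a)}}{\prod_{b \in \partial i \setminus a} Y^{(b \to i)}}, \\ -\mu \Delta F^{a \cup \partial a \setminus i} &= \ln \frac{Y^{(a \to i)}}{\prod_{j \in \partial a \setminus i} \left( Z_0^{(j \to a)} + Z_1^{(j \to a)} \right)}, \end{aligned} \tag{19} $$

one obtains the expressions of the free energy shifts involved in the iteration processes in terms of the local cavity
fields:

$$ \begin{aligned} -\mu \Delta F^{i \cup \partial i \setminus a} &= \ln \left[ e^{-\mu} + \exp\left( \mu \sum_{b \in \partial i \setminus a} v_{b \to i} \right) \right], \\ -\mu \Delta F^{a \cup \partial a \setminus i} &= 0 . \end{aligned} \tag{20} $$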



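In analogy with Eq. (6), from the probability distributions of the local fields one can also compute the 1RSB free energy
of the system as a sum of contributions due to the addition of a new item, $\Delta\phi_i$, of a new test, $\Delta\phi_a$, and of a new
edge between a variable and a test, $\Delta\phi_{jb}$:

$$ \begin{aligned} K \phi &= \sum_{i=1}^{K} \Delta\phi_i + \sum_{a=1}^{L} \Delta\phi_a - \sum_{(j, b)} \Delta\phi_{jb}, \\ e^{-\mu m \Delta\phi_i} &= \int \prod_{b \in \partial i} dQ_{b \to i}(v_{b \to i})\, e^{-\mu m \Delta F_i}, \\ e^{-\mu m \Delta\phi_a} &= \int \prod_{j \in \partial a} dP_{j \to a}(h_{j \to a})\, e^{-\mu m \Delta F_a}, \\ e^{-\mu m \Delta\phi_{jb}} &= \int dP_{j \to b}(h_{j \to b})\, dQ_{b \to j}(v_{b \to j})\, e^{-\mu m \Delta F_{jb}}, \end{aligned} \tag{21} $$

where the free energy shifts $\Delta F_i$, $\Delta F_a$, and $\Delta F_{jb}$ are defined in Eq. (8).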

In the 1RSB formalism the 1RSB free energy, $\phi(\mu, m)$, is given by (in the thermodynamic limit):



(22) 



(23) 



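Therefore, by Legendre transforming the 1RSB free energy, we obtain the complexity $\Sigma(f)$ (i.e., the logarithm of the
number of states with free energy density equal to f, divided by N):

$$ \Sigma(f) = \mu m^2 \frac{\partial \phi(\mu, m)}{\partial m}, \qquad f = \frac{\partial [m \phi(\mu, m)]}{\partial m}, \tag{24} $$

with $\mu m = \partial \Sigma(f) / \partial f$.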
On infinite random regular hyper-graphs one can assume that the 1RSB glassy phase is translationally invariant,
i.e., that the probability distributions of the local cavity fields are edge independent: for all edges $(i, a)$, $P_{i \to a}(h) = P(h)$
and $Q_{a \to i}(v) = Q(v)$. In this case, Eqs. (18) become two coupled integral equations for the probability distributions
P and Q, which can be solved for any value of $\mu$ and m by means of a population dynamics algorithm [6].

At low $\mu$ one finds with this procedure that P(h) and Q(v) always converge toward the RS solution: starting
from populations of fields with an arbitrary distribution, they converge to populations of identical fields such that
$P(h) = \delta(h - h_{RS})$ and $Q(v) = \delta(v - v_{RS})$, where $h_{RS}$ and $v_{RS}$ are the values of the local cavity fields which satisfy
the factorized RS equations, Eqs. (10).

When $\mu$ is increased, a first non-trivial distribution is found at $\mu = \mu_d$ for m = 1. At this point many states appear.
The phase space splits into many clusters of solutions and, even though they are only metastable (the equilibrium
state is still given by the RS solution at this point, since the maximum of the 1RSB free energy occurs at m > 1), they
could trap most of the algorithms and dynamical procedures which look for a covering pattern of a given hyper-graph.
Thus, for a density of active items smaller than this "dynamical" threshold, a survey propagation algorithm [7] should
be used to find solutions of the HVC for finite instances.

A static phase transition (which is only relevant at equilibrium) appears at a higher chemical potential, $\mu = \mu_c$, where
the maximum of the 1RSB free energy is located at m = 1, the complexity vanishes [6, 14, ...], and a thermodynamic
transition from the RS phase to a 1RSB glassy one takes place (for L = 6 and K = 12 we find $\mu_d \simeq 6.2$ and
$\mu_c \simeq 7$).



A. Minimal Hyper Vertex Cover 



Now we consider the minimal HVC problem. Namely, we still require that all the clauses are satisfied, but using
the minimum possible number of active variables. This corresponds to the $\mu \to \infty$ limit in our statistical physics
formulation.
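In the $\mu \to \infty$ limit the cavity updates (5) become two-valued: $h_{i\to a} = -1$ if all incoming $v_{b\to i}$ vanish and $h_{i\to a} = 0$ otherwise, while $v_{a\to i} = -1$ if all incoming $h_{j\to a}$ equal -1 and $v_{a\to i} = 0$ otherwise. A quick numerical way to see this is to evaluate Eqs. (5) at a large but finite $\mu$ and compare with these rules. The sketch below assumes the incoming fields are passed as plain tuples; all names are hypothetical, and the finite-$\mu$ corrections are of order $\ln 2/\mu$ or $\ln(K-1)/\mu$.

```python
import math
from itertools import product


def h_update(mu, v_in):
    """First of Eqs. (5): mu*h = -ln(1 + exp(mu*(1 + sum v))), computed stably."""
    t = mu * (1.0 + sum(v_in))
    softplus = t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))
    return -softplus / mu


def v_update(mu, h_in):
    """Second of Eqs. (5): mu*v = ln(1 - prod_j (1 - exp(mu*h_j)))."""
    probs = [math.exp(mu * h) for h in h_in]
    if any(p >= 1.0 for p in probs):  # some h_j = 0: the product vanishes, so v = 0
        return 0.0
    log_prod = sum(math.log1p(-p) for p in probs)
    return math.log(-math.expm1(log_prod)) / mu


def h_limit(v_in):
    return -1 if all(v == 0 for v in v_in) else 0


def v_limit(h_in):
    return -1 if all(h == -1 for h in h_in) else 0


mu = 200.0
for fields in product((0, -1), repeat=3):  # e.g. L - 1 = K - 1 = 3 incoming fields
    print(fields, round(h_update(mu, fields), 3), h_limit(fields),
          round(v_update(mu, fields), 3), v_limit(fields))
```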

We thus consider the $\mu \to \infty$ limit of the 1RSB equations, Eqs. (18). In this limit, according to Eqs. (5), it is
self-consistent to assume that the local cavity fields $h_{i \to a}$ and $v_{a \to i}$ can be equal either to minus one or to zero. In
particular, the field $v_{a \to i}$ is equal to minus one if all the incoming fields $h_{j \to a}$ are equal to minus one too, while it
equals zero if at least one of the $h_{j \to a}$ is zero. On the other hand, $h_{i \to a}$ turns out to be equal to minus one if all
the incoming fields $v_{b \to i}$ are equal to zero, and it equals zero if at least one of the $v_{b \to i}$ is minus one. Therefore, the
probability distribution functions reduce to:

$$ P_{i \to a}(h) = g_{-1}^{i \to a}\, \delta(h + 1) + g_0^{i \to a}\, \delta(h), \qquad Q_{a \to i}(v) = u_{-1}^{a \to i}\, \delta(v + 1) + u_0^{a \to i}\, \delta(v) \tag{25} $$

(with the constraints $u_0^{a \to i} + u_{-1}^{a \to i} = 1$ and $g_0^{i \to a} + g_{-1}^{i \to a} = 1$), and the 1RSB integral equations reduce to algebraic
equations for the coefficients. For instance, the second of Eqs. (18) becomes:



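$$ u_0^{a \to i} = 1 - \prod_{j \in \partial a \setminus i} g_{-1}^{j \to a} . $$

Analogously, for the first equation we have

$$ g_{-1}^{i \to a} = \frac{\prod_{b \in \partial i \setminus a} u_0^{b \to i}}{e^{-y} + (1 - e^{-y}) \prod_{b \in \partial i \setminus a} u_0^{b \to i}}, \tag{26} $$

where $y = \mu m$. The factor $e^{-y}$ is the reweighting $e^{-\mu m \Delta F^{i \cup \partial i \setminus a}}$ of Eq. (18): by Eq. (20), in this limit the shift
$\Delta F^{i \cup \partial i \setminus a}$ equals 1 when at least one incoming $v_{b \to i}$ is -1 (so that the variable must be active) and 0 otherwise.
Finally, after some algebra, the following system of equations for the coefficients $u_0^{a \to i}$ can be obtained:

$$ u_0^{a \to i} = 1 - \prod_{j \in \partial a \setminus i} \left[ \frac{\prod_{b \in \partial j \setminus a} u_0^{b \to j}}{e^{-y} + (1 - e^{-y}) \prod_{b \in \partial j \setminus a} u_0^{b \to j}} \right] . \tag{28} $$

These equations are the ("zero temperature") survey propagation (SP) equations for the minimal HVC problem.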
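On an individual instance, Eqs. (26) and (28) can be iterated as a message-passing algorithm on the edges of the factor graph. The sketch below performs one synchronous sweep. It is only an illustration under our own assumptions: edges are stored as (test, variable) pairs, so a variable repeated inside a test is treated as distinct edges, and damping and convergence control, which a practical SP implementation would need, are left out. All names are hypothetical. On a regular graph every edge sees the same degrees, so the edge-independent value of $u_0$ (obtained here by bisection on the homogeneous version of Eq. (28)) is an exact fixed point of the sweep, which the usage example checks.

```python
import math


def build_edges(tests):
    """Edge list e = (a, i) plus, for each variable and test, the ids of its edges."""
    edges = [(a, i) for a, test in enumerate(tests) for i in test]
    at_var, at_test = {}, {}
    for e, (a, i) in enumerate(edges):
        at_var.setdefault(i, []).append(e)
        at_test.setdefault(a, []).append(e)
    return edges, at_var, at_test


def sp_sweep(edges, at_var, at_test, u0, y):
    """One synchronous sweep of Eqs. (26) and (28); u0[e] stores u_0^{a->i}."""
    ey = math.exp(-y)
    g = []  # g[e] stores g_{-1}^{i->a} for e = (a, i), Eq. (26)
    for e, (a, i) in enumerate(edges):
        x = math.prod(u0[f] for f in at_var[i] if f != e)
        g.append(x / (ey + (1.0 - ey) * x))
    # u_0^{a->i} = 1 - product over the other edges (a, j) of g_{-1}^{j->a}
    return [1.0 - math.prod(g[f] for f in at_test[a] if f != e)
            for e, (a, i) in enumerate(edges)]


def homogeneous_u0(L, K, y, iters=200):
    """Bisection for the edge-independent solution u_0 = 1 - G(u_0)^(K-1)."""
    ey = math.exp(-y)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        x = mid ** (L - 1)
        if mid < 1.0 - (x / (ey + (1.0 - ey) * x)) ** (K - 1):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage on a small (L, K) = (2, 3) regular instance.
tests = [[0, 1, 2], [3, 4, 5], [0, 1, 3], [2, 4, 5]]
edges, at_var, at_test = build_edges(tests)
u_star = homogeneous_u0(2, 3, y=1.5)
new = sp_sweep(edges, at_var, at_test, [u_star] * len(edges), y=1.5)
print(u_star, max(abs(u - u_star) for u in new))
```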

In the case K = 2, one recovers the result found for the VC in Refs. [2, ...]. We will see in the next section how to use
these equations algorithmically in order to solve single instances. Assuming that the coefficients $u_0^{a \to i}$ solving Eq. (28) have been
FIG. 3: Complexity $\Sigma$ as a function of the density of active variables $\rho$ for L = 4 and K = 9. For $\rho_{cov} < \rho < \rho_s$ the 1RSB
Ansatz is stable; $\rho_{cov}$ (where $\Sigma = 0$) is the minimal covering density. Below $\rho_{cov}$ the complexity becomes negative: it is no longer
possible to find solutions with smaller densities where all the clauses are satisfied, and a COV/UNCOV transition takes place.
The 1RSB Ansatz is no longer stable for $\rho > \rho_s$, where further breaking of the replica symmetry is expected to occur.



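determined, the 1RSB free energy can be computed according to Eqs. (22):

$$ \begin{aligned} \Delta\phi_i &= -\frac{1}{y} \ln \left[ e^{-y} + (1 - e^{-y}) \prod_{b \in \partial i} u_0^{b \to i} \right], \\ \Delta\phi_a &= -\frac{1}{y} \ln \left[ 1 - (1 - e^{-y}) \prod_{j \in \partial a} g_{-1}^{j \to a} \right], \\ \Delta\phi_{jb} &= -\frac{1}{y} \ln \left[ 1 - (1 - e^{-y}) \left( 1 - u_0^{b \to j} \right) g_{-1}^{j \to b} \right], \end{aligned} \tag{29} $$

where the coefficients $g_{-1}^{j \to a}$ are expressed in terms of the $u_0^{b \to j}$ through Eq. (26).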



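In the minimal HVC limit one has $-y\phi = \Sigma - y\rho$. According to Eq. (24), the complexity $\Sigma$ is recovered by Legendre
transforming the function $\phi$ via the relation:

$$ \Sigma = y^2 \frac{\partial \phi}{\partial y}, \tag{30} $$

which leads to the following equation for the density of active items:

$$ \rho = \frac{\partial [y \phi(y)]}{\partial y} . \tag{31} $$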



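On infinite random regular hyper-graphs, the coefficients $u_0^{a \to i}$ can be assumed to be edge-independent: $u_0^{a \to i} = u_0$.
In this case the problem reduces to solving a single algebraic equation for the coefficient $u_0$:

$$ u_0 = 1 - \left[ \frac{u_0^{L-1}}{e^{-y} + (1 - e^{-y})\, u_0^{L-1}} \right]^{K-1} . \tag{32} $$

Once the solution $u_0$ of this equation has been found, the 1RSB free energy simplifies to:

$$ -y\phi(y) = \ln \left[ e^{-y} + (1 - e^{-y}) u_0^{L} \right] + \frac{L}{K} \ln \left[ 1 - (1 - e^{-y}) g^{K} \right] - L \ln \left[ 1 - (1 - e^{-y})(1 - u_0)\, g \right], \qquad g = \frac{u_0^{L-1}}{e^{-y} + (1 - e^{-y}) u_0^{L-1}} . \tag{33} $$

The complexity $\Sigma(\rho)$ and the density $\rho$ are then easily evaluated using Eqs. (30) and (31).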
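Eqs. (32) and (33) are simple enough to evaluate directly. The sketch below solves Eq. (32) by bisection (the mismatch $u_0 - 1 + G(u_0)^{K-1}$, with G the bracketed ratio, is increasing on [0, 1], so the root is unique), evaluates $y\phi(y)$ from Eq. (33), obtains $\rho$ from Eq. (31) by a central finite difference, and sets $\Sigma = y\rho - y\phi$ from $-y\phi = \Sigma - y\rho$. Step size, iteration counts, and names are our own assumptions. As a consistency check, our own expansion of Eq. (33) at large y (where $u_0^L \simeq (K-1)e^{-y}$) suggests that $\rho \to 1/K$ and that $\Sigma$ tends to the RS entropy of Eq. (13), which is negative for L = 4, K = 9.

```python
import math


def solve_u0(L, K, y, iters=200):
    """Bisection for Eq. (32)."""
    ey = math.exp(-y)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        x = mid ** (L - 1)
        if mid < 1.0 - (x / (ey + (1.0 - ey) * x)) ** (K - 1):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def y_phi(L, K, y):
    """y * phi(y) from Eq. (33)."""
    u = solve_u0(L, K, y)
    ey = math.exp(-y)
    c = 1.0 - ey
    g = u ** (L - 1) / (ey + c * u ** (L - 1))  # g_{-1}, Eq. (26), homogeneous case
    minus_y_phi = (math.log(ey + c * u ** L)
                   + (L / K) * math.log(1.0 - c * g ** K)
                   - L * math.log(1.0 - c * (1.0 - u) * g))
    return -minus_y_phi


def rho_sigma(L, K, y, dy=1e-3):
    """Eq. (31) by central differences, then Sigma = y*rho - y*phi."""
    rho = (y_phi(L, K, y + dy) - y_phi(L, K, y - dy)) / (2.0 * dy)
    return rho, y * rho - y_phi(L, K, y)


for y in (2.0, 5.0, 10.0, 20.0):
    print(y, *rho_sigma(4, 9, y))
```

Scanning y and plotting $\Sigma$ against $\rho$ gives a parametric curve of the type shown in Fig. 3; the zero of $\Sigma$ locates $\rho_{cov}$.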
In Figure 3 the complexity $\Sigma$ is plotted as a function of $\rho$ for L = 4 and K = 9. The complexity vanishes at $\rho_{cov}$,
corresponding to the minimal HVC density. At lower densities $\Sigma$ is negative, implying that for $\rho < \rho_{cov}$ no solutions
satisfying all the clauses can be found, and a COV/UNCOV transition takes place. The complexity has a maximum
at $\rho_d$ (given by $\partial_y^2 [y\phi(y)] = 0$). The curve also displays a non-concave part for $y < y_d$, whose physical interpretation
is not yet understood.



B. Stability of the IRSB solution 



To determine whether the equilibrium state is really described by a 1RSB solution or whether further replica
symmetry breakings occur, one has to study the stability of the 1RSB solution. The stability analysis of the RS
Ansatz investigates whether the RS state tends to split into exponentially many states. Since in the 1RSB phase the Gibbs
measure is decomposed into a cluster of different thermodynamic pure states [...], there are two different kinds of
instabilities that might show up [...]: i) either the states can aggregate into different clusters (in order to study
this first instability one has to compute the inter-state susceptibility); ii) or each state can fragment into different states
(in order to study this second instability the intra-state susceptibility must be computed).

In the minimal HVC limit, the instability of the first kind can be easily studied by computing the eigenvalue of the
(1 × 1) Jacobian matrix associated with Eq. (28) [...]. Since the linear susceptibility is related to a modulation
instability incompatible with the underlying random structure of the graph, the non-linear (spin glass) susceptibility must be
considered. Therefore, the criterion for the stability of the 1RSB Ansatz simply reads:



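$$ \sum_{j \in \partial a \setminus i;\, b \in \partial j \setminus a} \left( \frac{\partial u_0^{a \to i}}{\partial u_0^{b \to j}} \right)^2 < 1, \tag{34} $$

which yields:

$$ \sqrt{(L - 1)(K - 1)}\; \frac{(1 - u_0)\, e^{-y}}{u_0 \left[ e^{-y} + (1 - e^{-y})\, u_0^{L-1} \right]} < 1 . \tag{35} $$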
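The first-kind criterion (35) only requires the homogeneous solution $u_0(y)$ of Eq. (32), so it can be scanned numerically. The sketch below uses bisection for Eq. (32) (an implementation choice of ours) and returns the left-hand side of Eq. (35); names are hypothetical. As a sanity check we rely on our own large-y expansion: since $u_0^L \simeq (K-1)e^{-y}$ there, the left-hand side should tend to $\sqrt{(L-1)/(K-1)}$, which exceeds 1 for K = 2 and L ≥ 3, in line with the need for higher order RSB in the ordinary VC problem.

```python
import math


def solve_u0(L, K, y, iters=200):
    """Bisection for Eq. (32) (homogeneous SP fixed point)."""
    ey = math.exp(-y)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        x = mid ** (L - 1)
        if mid < 1.0 - (x / (ey + (1.0 - ey) * x)) ** (K - 1):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def first_kind_lhs(L, K, y):
    """Left-hand side of Eq. (35); the 1RSB solution is stable if it is < 1."""
    u = solve_u0(L, K, y)
    ey = math.exp(-y)
    return (math.sqrt((L - 1) * (K - 1)) * (1.0 - u) * ey
            / (u * (ey + (1.0 - ey) * u ** (L - 1))))


for L, K in [(4, 9), (3, 2)]:
    print(L, K, [round(first_kind_lhs(L, K, y), 4) for y in (1.0, 5.0, 20.0, 60.0)])
```

This only covers the instability of the first kind; the second kind is treated below.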



To study the instability of the second kind, instead, we consider a two-step RSB-like Ansatz of the form $\mathcal{Q}[Q] = \sum_l u_l\, \delta[Q(u) - \delta(u - l)]$
and $\mathcal{P}[P] = \sum_l g_l\, \delta[P(h) - \delta(h - l)]$, where the 1RSB states coincide with the 2RSB clusters
and the 2RSB states reduce to single configurations. We want to compute the widening $\delta Q = \sum_{m \neq l} \epsilon_m [\delta(u - m) - \delta(u - l)]$,
which can be written in terms of the widening $\delta P = \sum_{q \neq r} \gamma_q [\delta(h - q) - \delta(h - r)]$, and check whether
or not it grows under iteration. In order to do that we have to sum over all the perturbations of the cavity fields on
the neighboring sites which change the configuration of a given site from l to m, according to their Boltzmann weight
(for a general explanation see [..., 23]):



1 



E 



E 



{ii,...jir-l}eOa\i;(gJi-°,...,g'-ff-i-"')-^i 



a„(j'2-yi)A0°"®")'(g'i 



(36) 



A similar equation, which gives the $(\gamma_q)_r$ in terms of the $(\epsilon_m)_l$, can be derived. The problem can then be rewritten
solely in terms of the coefficients $u_0$ and $u_{-1}$. Using a transfer matrix representation we find
$(\epsilon_0)_{-1} = (L - 1)(K - 1) \sum_{b, c = 0, -1} T_{(0, -1)(b, c)} (\epsilon_b)_c$. Therefore the 1RSB solution is stable provided that the eigenvalue of
T of largest modulus, $\Lambda$, is such that $(L - 1)(K - 1)\Lambda < 1$. The transfer matrix T reads (without loss of generality
we can set $y_1 = y_2 = y$):





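[Eq. (37): the 2×2 transfer matrix $T$ in the basis $\{(0,1),(1,0)\}$, normalized by $1/Z$, with entries built from $u_0^{LK-L-K}$ and $(1-u_0)$.]

where the normalization is given by $Z = \left[e^{-y} + \left(1-e^{-y}\right)u_0^{L-1}\right]^{K-1}$. Therefore, the stability criterion of the 1RSB solution is given by: 

[Eq. (38): $(L-1)(K-1)$ times an expression in $u_0^{2(L-1)(K-1)-1}$, $e^{-y}$ and $\left[e^{-y}+\left(1-e^{-y}\right)u_0^{L-1}\right]$, required to be smaller than 1.]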
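The second-kind stability test, $(L-1)(K-1)\Lambda < 1$, only requires the eigenvalue of largest modulus of the 2×2 matrix $T$. The minimal sketch below takes the entries of $T$ as inputs; the function names are hypothetical and the numerical matrix in the usage lines is a placeholder chosen for illustration, not the matrix of Eq. (37).

```python
import cmath


def largest_modulus_eigenvalue(T):
    """|Lambda| for a 2x2 matrix T = [[a, b], [c, d]], from the characteristic
    polynomial lambda^2 - tr(T) lambda + det(T) = 0 (roots may be complex)."""
    (a, b), (c, d) = T
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))


def second_kind_stable(T, L, K):
    """True if (L - 1)(K - 1) * Lambda < 1, i.e. perturbations shrink under iteration."""
    return (L - 1) * (K - 1) * largest_modulus_eigenvalue(T) < 1.0


# Hypothetical entries, for illustration only (not Eq. (37)).
T_example = [[0.0, 0.04], [0.25, 0.0]]          # eigenvalues are +-0.1
print(largest_modulus_eigenvalue(T_example))
print(second_kind_stable(T_example, L=2, K=3), second_kind_stable(T_example, L=4, K=6))  # True False
```

Using `cmath` keeps the formula valid when the discriminant is negative, in which case the two eigenvalues are complex conjugates with the same modulus.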
FIG. 4: Phase diagram of the minimal Hyper Vertex Cover problem. Black squares, red circles and blue triangles correspond, 
respectively, to the values of L and K for which the minimal HVC configurations are described by a Replica Symmetric Ansatz, 
a 1-step Replica Symmetry Breaking Ansatz, and a higher-order Replica Symmetry Breaking Ansatz. As already shown in [?], 
for K = 2 the solution of VC exhibits higher-order RSB for every value of L. 

For some values of L and K, according to the stability criteria given in Eqs. (35) and (38), we find that the 1RSB 
solution is stable around the COV/UNCOV transition, and therefore the threshold is likely to be exact. By 
contrast, for other values of L and K the 1RSB approach is unstable, and a higher-order Replica Symmetry Breaking 
transition is expected to occur. In this case a more involved analysis would be required to locate the COV/UNCOV 
threshold (the 1RSB result is expected to provide a lower bound). This happens, for instance, for K = 2, where the 
results already found for the standard VC problem are fully recovered. 

The phase diagram of the system as a function of the item and test degrees L and K is presented in Fig. 4, 
showing the relative positions of the different phases in the minimal covering limit. 

V. SURVEY PROPAGATION AND SURVEY INSPIRED DECIMATION 

We have already mentioned the possibility that, in analogy with K-satisfiability and other optimization problems, 
the BP and SP equations, Eqs. (5) and (26)-(27), may provide efficient algorithmic tools to find solutions of the HVC 
problem on large instances [?, 10]. In the present section, taking advantage of the analytical investigations presented 
above, we test numerically the efficiency of these message-passing algorithms on large samples (for several values of 
the connectivity), and compare their performance to a simple heuristic covering algorithm. We show that BP and 
SP provide an efficient way to solve the HVC, finding covering patterns whose sizes are very close to the 
minimal one. Moreover, in the region of K and L where a 1-step RSB occurs, we show that the SP algorithm improves on 
the BP results and finds solutions of the HVC extremely close to the COV/UNCOV threshold predicted 
by the theoretical analysis. 

In the following, we briefly describe the three algorithms used, and then present the results. 

A. Belief Propagation (BP) Algorithm 

The BP algorithm simply consists of finding by iteration a solution of Eqs. (5) on a given factor graph, and then 
applying an iterative decimation procedure which, at each step, polarizes the most biased variable, until all the 
variables are fixed. Thus, the BP algorithm works by iterating the following three steps: 

1. Solve the BP equations, Eqs. (5), on the graph, until all the messages converge to a fixed point. 

2. Compute the marginal of each variable (i.e., its probability of being zero or one), and polarize the 
most biased variable by assigning to it its most probable value. 
3. Generate the new reduced graph by removing the variable that has been polarized and all the edges incident on 
it. If the variable has been polarized to one, we also remove all the tests to which it is connected 
(since they are automatically satisfied) and all the edges incident on those tests. 
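The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation behind the results reported here: we assume the weight $e^{-\mu\sum_i x_i}$ on configurations in which every test contains at least one particle (the sign convention for $\mu$ is our assumption), we keep $\mu$ fixed instead of tuning it to keep the RS entropy positive, and we parametrize each test-to-variable message by the hypothetical ratio $h_{a\to i}=\psi_{a\to i}(0)/\psi_{a\to i}(1)=1-\prod_{j\in\partial a\setminus i}P_{j\to a}(x_j=0)$. All function names are hypothetical, and the guard that prevents emptying an unsatisfied test is our addition.

```python
import math
import random


def random_regular_factor_graph(n_vars, L, K, seed=0):
    """Configuration-model (L, K)-regular factor graph; each test is a list of
    variable indices. Repeated variables inside a test are merged, so a few
    tests may end up with fewer than K variables (a simplification)."""
    if (n_vars * L) % K:
        raise ValueError("n_vars * L must be divisible by K")
    rng = random.Random(seed)
    stubs = [i for i in range(n_vars) for _ in range(L)]
    rng.shuffle(stubs)
    return [sorted(set(stubs[s:s + K])) for s in range(0, len(stubs), K)]


def bp_marginals(n_vars, tests, mu, iters=300, damping=0.5, tol=1e-7):
    """Return P(x_i = 0) for every variable, from damped BP iterations."""
    adj = [[] for _ in range(n_vars)]
    for a, t in enumerate(tests):
        for i in t:
            adj[i].append(a)
    emu = math.exp(-mu)                      # weight of one particle (assumed convention)
    h = {(a, i): 0.5 for a, t in enumerate(tests) for i in t}

    def cavity_zero(j, a):                   # P_{j->a}(x_j = 0)
        p = math.prod(h[(b, j)] for b in adj[j] if b != a)
        return p / (p + emu)

    for _ in range(iters):
        diff = 0.0
        for a, t in enumerate(tests):
            for i in t:
                target = 1.0 - math.prod(cavity_zero(j, a) for j in t if j != i)
                new = damping * h[(a, i)] + (1.0 - damping) * target
                diff = max(diff, abs(new - h[(a, i)]))
                h[(a, i)] = new
        if diff < tol:
            break
    marg = []
    for i in range(n_vars):
        p = math.prod(h[(a, i)] for a in adj[i])
        marg.append(p / (p + emu))
    return marg


def bp_decimation(n_vars, tests, mu=2.0):
    """Steps 1-3: solve BP, polarize the most biased variable, reduce the graph."""
    x = [None] * n_vars
    active = [set(t) for t in tests]         # unsatisfied tests, free variables only
    while True:
        live = [sorted(t) for t in active if t]
        free = sorted({i for t in live for i in t})
        if not free:                         # no tests left: isolated variables -> 0
            return [0 if v is None else v for v in x]
        idx = {v: k for k, v in enumerate(free)}
        p0 = bp_marginals(len(free), [[idx[i] for i in t] for t in live], mu)
        k = max(range(len(free)), key=lambda m: abs(p0[m] - 0.5))
        i, val = free[k], (1 if p0[k] < 0.5 else 0)
        # Added guard (not in the text): never empty a test that is still unsatisfied.
        if val == 0 and any(t == {i} for t in active):
            val = 1
        x[i] = val
        for t in active:
            if i in t:
                if val == 1:
                    t.clear()                # test satisfied: drop it and its edges
                else:
                    t.discard(i)             # only the edge (i, a) disappears


if __name__ == "__main__":
    demo_tests = random_regular_factor_graph(36, 3, 4, seed=0)
    cover = bp_decimation(36, demo_tests, mu=2.0)
    print("covering density:", sum(cover) / len(cover))
```

Re-running BP from scratch after every single fixing is the most literal reading of steps 1-3; it costs one full BP solution per fixed variable, so for large instances one would at least warm-start the messages from the previous step.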

There is a problem concerning the convergence of BP that has to be mentioned. As discussed in the 
previous sections, depending on the values of K, L and μ, the entropy of the RS solution of the HVC (i.e., the logarithm 
of the number of coverings at the corresponding density) can be either positive or negative. In order to make the BP equations converge, 
we fix the value of μ in such a way that the RS entropy of the problem is positive. However, as the BP 
algorithm is iterated, the decimated graph changes, and consequently so does the entropy associated with the problem defined 
on the reduced graph. As a consequence, the BP equations may stop converging on the reduced 
graph after some variables have been fixed. In order to overcome this problem, during the decimation procedure 
we tune the chemical potential in such a way that the RS entropy is always kept positive. (An alternative, 
equivalent strategy consists in choosing at each decimation step the largest value of μ for which the BP equations 
converge to a fixed point.) 

B. Survey Propagation (SP) Algorithm 

In some regions of the phase space the BP equations possess a large number of solutions (corresponding to different 
thermodynamic states), and a local iterative updating scheme fails to converge to any of them. BP works well if a single 
cluster of minimal HVC configurations exists. However, replica symmetry breaking implies the emergence of clustering in 
the solution space. This effect is captured by the SP algorithm, first proposed in [13], which describes the statistics 
over all the solutions of the BP equations, taking into account their thermodynamic weight. Basically, the SP 
algorithm is very similar to the BP one, and it consists of the same three steps as before. The only difference is 
that the SP messages, Eqs. (26) and (27), must now be used instead of the BP ones. In order to obtain a minimal 
HVC covering of the factor graph, we would like to set the value of y to that corresponding to the COV/UNCOV 
transition, where the 1RSB free energy has a maximum and the complexity vanishes. However, once 
some variables have been fixed, the SP equations may stop converging on the reduced graph for that value of y. 
This is because, as the decimation proceeds, the graph and, consequently, the complexity change, 
and the value of y in use may now fall into the uncoverable region of the decimated problem. Therefore, in 
order to overcome this problem, after some decimation steps we recompute the complexity of the decimated problem 
defined on the reduced graph, and we tune y to the value corresponding to the new COV/UNCOV threshold. 
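As a sketch of the survey iteration that replaces step 1 in the SP version, the code below iterates, on a given factor graph, surveys $\eta_{a\to i}$ (the probability, over clusters, that test a forces variable i to one). Eqs. (26)-(27) are not reproduced in this section, so the update rule is an assumption: we take the inhomogeneous counterpart of the homogeneous fixed point whose derivative reproduces the ratio in Eq. (35), with every particle reweighted by $e^{-y}$. Function names, the random initialization and the damping scheme are likewise hypothetical. Steps 2 and 3 would then proceed as for BP, using the probability that a variable is frozen to one as its bias.

```python
import math
import random


def sp_surveys(n_vars, tests, y, iters=500, damping=0.5, tol=1e-9, seed=0):
    """Iterate surveys eta[(a, i)] = P(test a warns variable i that it must be 1).

    Assumed minimal-limit rule: a warns i when every other variable j of a sits
    at 0, i.e. receives no warning from its other tests; a variable forced to 1
    is reweighted by exp(-y)."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n_vars)]
    for a, t in enumerate(tests):
        for i in t:
            adj[i].append(a)
    ey = math.exp(-y)
    eta = {(a, i): rng.random() for a, t in enumerate(tests) for i in t}

    def zero_prob(j, a):
        # u = probability that no test other than a warns j
        u = math.prod(1.0 - eta[(b, j)] for b in adj[j] if b != a)
        return min(1.0, u / (ey + (1.0 - ey) * u))

    for _ in range(iters):
        diff = 0.0
        for a, t in enumerate(tests):
            for i in t:
                target = math.prod(zero_prob(j, a) for j in t if j != i)
                new = damping * eta[(a, i)] + (1.0 - damping) * target
                diff = max(diff, abs(new - eta[(a, i)]))
                eta[(a, i)] = new
        if diff < tol:
            break
    return eta


def frozen_one_prob(n_vars, eta, y):
    """Reweighted probability that each variable receives at least one warning."""
    ey = math.exp(-y)
    u = [1.0] * n_vars
    for (_, i), e in eta.items():
        u[i] *= 1.0 - e
    return [(1.0 - w) * ey / (ey + (1.0 - ey) * w) for w in u]
```

On an (L, K)-regular graph a uniform survey $\eta = 1 - u_0$ is a fixed point of this rule exactly when $u_0$ solves the homogeneous equation assumed for Eq. (35), which is what ties the sketch to the analytical part.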

C. Greedy Algorithm 

Here we present a very simple heuristic algorithm which finds a covering pattern of the hypergraph. The 
algorithm consists of the following steps: 

1. We pick the variable with the highest degree and set it to one. We then remove from the graph this variable 
and all the edges incident on it. We also remove all the tests connected to that variable (since they are satisfied) 
and all the edges incident on those tests. We repeat this procedure until there are no more tests, and hence no 
variables left with degree larger than zero. 

2. We fix to zero all the remaining variables, which are all isolated. 
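The two steps above translate almost literally into code. In this sketch (the function name is hypothetical), the degree of a variable is counted only over tests that are still uncovered, and ties in degree are broken by the lowest variable index, a choice the text does not specify.

```python
def greedy_cover(n_vars, tests):
    """Greedy covering: repeatedly set the highest-degree variable to one."""
    x = [0] * n_vars                          # step 2: leftover variables stay at 0
    var_tests = [set() for _ in range(n_vars)]
    for a, t in enumerate(tests):
        for i in t:
            var_tests[i].add(a)               # uncovered tests seen by each variable
    while True:
        # step 1: variable with the highest current degree (lowest index on ties)
        i = max(range(n_vars), key=lambda v: len(var_tests[v]))
        if not var_tests[i]:                  # no tests left: every variable is isolated
            return x
        x[i] = 1
        for a in list(var_tests[i]):          # tests covered by i are removed...
            for j in tests[a]:
                var_tests[j].discard(a)       # ...together with all their edges


print(greedy_cover(6, [[0, 1, 2], [3, 4, 5], [0, 1, 3], [2, 4, 5]]))  # [1, 0, 0, 0, 1, 0]
```

Scanning all variables at every pick costs O(N) per step; for instances with thousands of variables a bucket queue keyed on the current degree would make the whole procedure roughly linear in the number of edges.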

D. Results 

We have tested these three algorithms on large instances for several values of L and K. It turns out that in general 
both BP and SP perform much better than the greedy algorithm, and are able to find efficiently solutions of the HVC 
very close to the minimal COV/UNCOV threshold predicted by the analytical calculations (in general the 
solutions provided by BP and SP are at most a few percent larger than the minimal one). In particular, for 
values of L and K for which a 1-step RSB transition occurs (see Fig. 3), SP is able to improve the BP result. To 
give a more quantitative idea of the performance of the different algorithms, in Fig. 5 we present the results of the 
numerical tests of the three algorithms for L = 4 and K = 6. 






FIG. 5: Sizes of the coverings of the hypergraph obtained by the BP, the SP and the greedy algorithm for an HVC problem with 
L = 4 and K = 6, for 12288 variables and 8192 tests. The COV/UNCOV threshold in this case is ρ_cov ≈ 0.178. The 
SP algorithm finds solutions of size ρ_SP = 0.182; the BP algorithm generates coverings of size ρ_BP = 0.186. Finally, 
the greedy algorithm gives solutions of size ρ_gr = 0.212. 
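As a quick check on the numbers quoted in the caption: the instance sizes are consistent with an (L, K)-regular factor graph, since both sides count the same N L = M K edges, and the relative excess of each algorithm over the threshold follows directly from the quoted densities. The percentages are approximate because ρ_cov is itself quoted as ≈ 0.178.

```python
N, M, L, K = 12288, 8192, 4, 6
assert N * L == M * K == 49152          # edges counted from variables and from tests

rho_cov = 0.178                         # COV/UNCOV threshold quoted in Fig. 5
rho = {"SP": 0.182, "BP": 0.186, "greedy": 0.212}
excess = {name: 100.0 * (r - rho_cov) / rho_cov for name, r in rho.items()}
for name, pct in excess.items():
    print(f"{name}: {pct:.1f}% above the threshold")
```

This prints about 2.2%, 4.5% and 19.1% for SP, BP and greedy, respectively, in line with the statement that BP and SP stay within a few percent of the minimal density.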



VI. DISCUSSION AND CONCLUSION 

In this paper we have studied the statistical mechanics of a generalized Vertex Cover problem on hypergraphs, 
which might have many practical applications. As an example, the problem of Group Testing is deeply related 
to the HVC, and knowledge of the phase diagram of the latter could give important insights into how to devise an 
efficient reconstruction algorithm for the former, depending on L, K and on the density of active items. 

The minimal HVC has been studied in great detail. For low enough degree of the items (L = 2; L = 3 and K > 7; 
L = 4 and K > 21; ...) we find that the RS solution correctly describes the minimal covering configurations. For 
larger values of L, the RS Ansatz becomes inconsistent, whereas the 1RSB solution is stable and is likely to provide 
the correct solution around the COV/UNCOV threshold. In both the RS and the 1RSB regions we have found 
explicit results on the minimal density of active items required to cover the factor graph, and on the structure of the 
phase space. On the other hand, there are also cases (e.g., the ordinary VC problem [2], [5], K = 2 and L > 3) where 
the 1RSB approach becomes inconsistent and further RSB steps are required. 

Finally, we have shown that a decimation procedure based on the BP and SP equations turns out to be a very 
efficient strategy to solve large individual instances of the HVC problem. 